Large-Scale Knowledge Acquisition from Botanical Texts
Authors
Abstract
Free-text botanical descriptions in printed floras can provide a wealth of valuable scientific information. Despite this richness, these texts have seldom been analyzed at scale with NLP techniques. To fill this gap, we describe how we extracted a set of terminological resources by parsing a large corpus of botanical texts. We present the tools and techniques used, as well as our rationale for favoring a deep-parsing approach coupled with error-mining methods over simple pattern matching.
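The abstract does not say which error-mining method was used. One well-known variant, in the spirit of van Noord's parsability measure, scores each word or n-gram by the fraction of sentences containing it that the parser managed to parse: items that occur often but mostly in failed sentences point to gaps in the grammar or lexicon. The sketch below assumes we already have tokenized sentences and a per-sentence parse-success flag from some parser. The function names, the toy botanical sentences, and the thresholds are all hypothetical illustrations, not the authors' implementation.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

Gram = Tuple[str, ...]


def ngrams(tokens: Sequence[str], n: int) -> Set[Gram]:
    """Distinct n-grams of a sentence (each counted once per sentence)."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def parsability(sentences: List[List[str]], parsed_flags: List[bool],
                n: int = 1) -> Dict[Gram, Tuple[float, int]]:
    """Map each n-gram to (parsability, frequency), where
    parsability = #parsed sentences containing it / #sentences containing it."""
    total: Counter = Counter()
    ok: Counter = Counter()
    for tokens, parsed in zip(sentences, parsed_flags):
        for g in ngrams(tokens, n):
            total[g] += 1
            if parsed:
                ok[g] += 1
    return {g: (ok[g] / total[g], total[g]) for g in total}


def suspects(sentences: List[List[str]], parsed_flags: List[bool], n: int = 1,
             min_count: int = 2, threshold: float = 0.5) -> List[Tuple[Gram, float, int]]:
    """Frequent n-grams (>= min_count) whose parsability is below threshold,
    most suspicious (lowest parsability, then highest frequency) first."""
    scores = parsability(sentences, parsed_flags, n)
    hits = [(g, p, c) for g, (p, c) in scores.items()
            if c >= min_count and p < threshold]
    return sorted(hits, key=lambda x: (x[1], -x[2]))


# Hypothetical toy corpus: flags say whether a (hypothetical) parser succeeded.
corpus = [s.split() for s in [
    "leaves ovate , glabrous",
    "leaves lanceolate , glabrescent",
    "petals 5 , white",
    "petals 5 , glabrescent",
    "sepals ovate , white",
]]
parsed = [True, False, True, False, True]

print(suspects(corpus, parsed))  # [(('glabrescent',), 0.0, 2)]
```

Here the unknown term "glabrescent" occurs only in the two failed sentences, so it surfaces as the top candidate for a missing lexicon entry. Real error-mining setups typically also consider longer n-grams and iterate the scores, which this sketch omits.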
Similar resources
Automatic Knowledge Acquisition by Semantic Analysis and Assimilation of Textual Information
Automatic knowledge acquisition is one of the bottlenecks in artificial intelligence and large-scale applications of natural language processing (NLP). There are many efforts to create large knowledge bases (KBs) or to automatically derive knowledge from large text corpora. On the one hand, we meet KBs like CYC, where a tremendous amount of work has been invested by knowledge enterers who have ...
The Cognitive and Social Grounding of Large-Scale Knowledge Resources
We describe the general approach of a sub-project seeking to develop cognitively and socially adequate knowledge resources. Specifically, the present paper outlines a text file acquisition system that (a) allows any user to submit their digitized versions of literary texts, (b) lets users improve their contributions at any later time, and (c) encourages all users to evaluate contributed text files...
Kleo: A Bootstrapping Learning-by-Reading System
KLEO is a bootstrapping learning-by-reading system that builds a knowledge base in a fully automated way by reading texts for a domain. KLEO’s initial knowledge base is a small knowledge base that consists of domain independent knowledge and KLEO expands the knowledge base with the information extracted from texts. A key facility in KLEO is knowledge integration which combines new information g...
Integrating Natural Language, Knowledge Representation and Reasoning, and Analogical Processing to Learn by Reading
• radically change the economics of building large knowledge bases
• provide a platform for cognitive simulations of larger-scale phenomena
• Learning Reader learns by reading simplified language texts
• Manages syntactic complexity
• Unconstrained vocabulary, unlike controlled languages
• Learning Reader combines
• Natural language processing
• A large-scale knowledge base
• Deductive reasoning
• Analog...
Combining NLP and statistical techniques for lexical acquisition
The growing availability of large on-line corpora encourages the study of word behaviour directly from accessible raw texts. However, the methods by which lexical knowledge should be extracted from plain texts are still a matter of debate and experimentation. This paper presents ARIOSTO, an integrated tool for lexical acquisition from corpora, based on a hybrid methodology that combines ...